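Recently, DEtection TRansformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose MTTrans, an end-to-end cross-domain detection transformer based on the mean teacher framework, which can fully exploit unlabeled target-domain data in object detection training and transfer knowledge between domains via pseudo labels. We further propose comprehensive multi-level feature alignment to improve the pseudo labels generated by the mean teacher framework, taking advantage of the cross-scale self-attention mechanism in Deformable DETR. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). Conversely, the unlabeled target-domain data, once pseudo-labeled and made available for object detection training by the mean teacher framework, can lead to better feature extraction and alignment. Thus, the mean teacher framework and the comprehensive multi-level feature alignment can be optimized iteratively and mutually on the basis of the transformer architecture. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in three domain adaptation scenarios; in particular, the result on the Sim10k to Cityscapes scenario improves from 52.6 mAP to 57.9 mAP. Code will be released.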
translated by 谷歌翻译
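The mean teacher framework named in the MTTrans abstract above is usually built from two pieces: a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a confidence filter on the teacher's predictions that produces pseudo labels. The sketch below illustrates only those two generic pieces. The decay value, the score threshold, and the names `ema_update` and `filter_pseudo_labels` are assumptions for illustration and are not taken from the MTTrans paper.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher weight toward the student weight by an EMA step.

    `teacher` and `student` are plain dicts mapping parameter names to floats
    (a stand-in for real model parameters; hypothetical layout).
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def filter_pseudo_labels(predictions, threshold=0.5):
    """Keep teacher predictions whose confidence reaches the threshold.

    `predictions` is a list of (label, score) pairs; the threshold is an
    assumed hyperparameter, not a value from the abstract.
    """
    return [(label, score) for label, score in predictions if score >= threshold]
```

In a training loop, the student would be trained on labeled source data plus the filtered pseudo labels on target data, and `ema_update` would be called after each student step.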
External inspection of rolling stock's underfloor equipment is currently performed visually by human inspectors. In this study, we attempt to partly automate this inspection by investigating anomaly inspection algorithms that use image processing technology. Because railroad maintenance studies tend to have little anomaly data, unsupervised learning methods are usually preferred for anomaly detection; however, training cost and accuracy remain challenges. Researchers have also created anomalous images from normal images by adding noise and similar perturbations, but the anomaly targeted in this study is the rotation of piping cocks, which is difficult to create using noise. Therefore, we propose a new method that applies style conversion via generative adversarial networks to three-dimensional computer graphics to imitate anomaly images, so that anomaly detection can be based on supervised learning. A geometry-consistent style conversion model was used to convert the images, so the color and texture were successfully made to imitate real images while the anomalous shape was maintained. Using the generated anomaly images as supervised data, the anomaly detection model can be trained easily without complex adjustments and successfully detects anomalies.
In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these components -- the composition function and the self-attention mechanism -- can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrate that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of individual linguistic phenomena suggests that the composition function allows syntactic features, but not semantic features, to percolate into subtree representations.
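We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques". Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domain performs poorly in a target domain. In DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts. In that task, the occurrence of domain shifts was assumed to be known. In practice, however, the domain of each sample may not be given, and domain shifts can occur implicitly. In 2022 Task 2, we focus on domain generalization techniques that detect anomalies regardless of domain shifts. Specifically, the domain of each sample is not given in the test data, and only one threshold is allowed for all domains. We will add challenge results and an analysis of the submissions after the challenge submission deadline.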
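This paper aims to develop an acoustic signal-based unsupervised anomaly detection method for automatic machine monitoring. Existing approaches such as the deep autoencoder (DAE), variational autoencoder (VAE), and conditional variational autoencoder (CVAE) have limited representation capabilities in the latent space and, hence, poor anomaly detection performance. A different model has to be trained for each type of machine to perform the anomaly detection task accurately. To solve this issue, we propose a new method called the hierarchical conditional variational autoencoder (HCVAE). This method utilizes available taxonomic hierarchical knowledge about industrial facilities to refine the latent space representation. This knowledge also helps the model improve anomaly detection performance. We demonstrate the generalization capability of a single HCVAE model across different types of machines by using appropriate conditions. Additionally, to show the practicability of the proposed approach, (i) we evaluate the HCVAE model on different domains and (ii) we examine the effect of partial hierarchical knowledge. Our results show that the HCVAE method validates both of these points and outperforms the baseline system on the anomaly detection task by up to 15% on the AUC score metric.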
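The HCVAE abstract above reports its gain on the AUC score metric. As a reminder of what that metric measures, the sketch below computes AUC directly from its pairwise definition: the fraction of (anomalous, normal) score pairs in which the anomalous sample receives the higher anomaly score, with ties counted as one half. The function name `auc` and the assumption that higher scores mean "more anomalous" are illustrative choices, not details from the paper.

```python
def auc(normal_scores, anomalous_scores):
    """Area under the ROC curve from its pairwise-ranking definition.

    Assumes a higher score means "more anomalous". Ties count as half a win.
    Runs in O(len(normal_scores) * len(anomalous_scores)) time, which is fine
    for a small illustration.
    """
    if not normal_scores or not anomalous_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(normal_scores) * len(anomalous_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of normal and anomalous samples.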
A method to perform offline and online speaker diarization for an unlimited number of speakers is described in this paper. End-to-end neural diarization (EEND) has achieved overlap-aware speaker diarization by formulating it as a multi-label classification problem. It has also been extended for a flexible number of speakers by introducing speaker-wise attractors. However, the output number of speakers of attractor-based EEND is empirically capped; it cannot deal with cases where the number of speakers appearing during inference is higher than that during training because its speaker counting is trained in a fully supervised manner. Our method, EEND-GLA, solves this problem by introducing unsupervised clustering into attractor-based EEND. In the method, the input audio is first divided into short blocks, then attractor-based diarization is performed for each block, and finally, the results of each block are clustered on the basis of the similarity between locally-calculated attractors. While the number of output speakers is limited within each block, the total number of speakers estimated for the entire input can be higher than the limitation. To use EEND-GLA in an online manner, our method also extends the speaker-tracing buffer, which was originally proposed to enable online inference of conventional EEND. We introduce a block-wise buffer update to make the speaker-tracing buffer compatible with EEND-GLA. Finally, to improve online diarization, our method improves the buffer update method and revisits the variable chunk-size training of EEND. The experimental results demonstrate that EEND-GLA can perform speaker diarization of an unseen number of speakers in both offline and online inferences.
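The block-wise idea in the EEND-GLA abstract above can be illustrated with a toy example: each block yields a few local attractor vectors, and attractors from different blocks are merged into global speakers by similarity. The sketch below is a simplified greedy stand-in, not the clustering used in the paper. The cosine threshold, the rule that two attractors from the same block never share a speaker, and the name `cluster_attractors` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_attractors(blocks, threshold=0.8):
    """Assign a global speaker id to every local attractor.

    `blocks` is a list of blocks; each block is a list of attractor vectors.
    Greedy rule (an assumption): match an attractor to the most similar known
    speaker not already used in this block, or open a new speaker if no
    similarity reaches `threshold`.
    """
    speakers = []  # one representative vector per global speaker
    labels = []
    for attractors in blocks:
        used = set()  # speakers already assigned within this block
        block_labels = []
        for vec in attractors:
            best, best_sim = None, threshold
            for idx, rep in enumerate(speakers):
                if idx in used:
                    continue
                sim = cosine(vec, rep)
                if sim >= best_sim:
                    best, best_sim = idx, sim
            if best is None:
                speakers.append(vec)
                best = len(speakers) - 1
            used.add(best)
            block_labels.append(best)
        labels.append(block_labels)
    return labels
```

With two attractors per block, `cluster_attractors([[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [1, 0.1, 0]]])` assigns three distinct global speakers in total, which mirrors the abstract's point that the overall speaker count can exceed the per-block limit.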
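Neural networks and related deep learning methods are currently at the leading edge of technologies used for classifying objects. However, they generally demand large amounts of time and data for model training, and their learned models can sometimes be difficult to interpret. In this paper, we advance FastMapSVM, an interpretable machine learning framework for classifying complex objects, as an advantageous alternative to neural networks for general classification tasks. FastMapSVM extends the applicability of support-vector machines (SVMs) to domains with complex objects by combining the complementary strengths of FastMap and SVMs. FastMap is an efficient linear-time algorithm that maps complex objects to points in a Euclidean space while preserving the pairwise domain-specific distances between them. We demonstrate the efficiency and effectiveness of FastMapSVM in the context of classifying seismograms. We show that its performance, in terms of precision, recall, and accuracy, is comparable to that of other state-of-the-art methods. However, compared with other methods, FastMapSVM uses significantly less time and data for model training. It also provides a perspicuous visualization of the objects and the classification boundaries between them. We expect FastMapSVM to be viable for classification tasks in many other real-world domains.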
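The embedding step described in the FastMapSVM abstract above can be sketched with the classic FastMap procedure: pick two far-apart pivot objects, project every object onto the line through them using only pairwise distances, then repeat on the residual distances. The code below is a minimal version of that classic procedure under stated assumptions. The pivot heuristic, the function name `fastmap`, and the object and distance representations are illustrative and may differ from the variant used in the paper.

```python
import math


def fastmap(objects, dist, k):
    """Embed `objects` into k-dimensional Euclidean space using only `dist`.

    `dist(a, b)` is any symmetric, non-negative distance function.
    Each dimension uses a number of distance evaluations linear in len(objects).
    """
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance left over after removing the first `dim` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for t in range(dim):
            d2 -= (coords[i][t] - coords[j][t]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic (an assumption): start at object 0, jump to the
        # farthest object, then to the object farthest from that one.
        b = max(range(n), key=lambda j: residual_sq(0, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        dab_sq = residual_sq(a, b, dim)
        if dab_sq == 0.0:
            break  # no residual distance left to explain
        dab = math.sqrt(dab_sq)
        for i in range(n):
            # Cosine-law projection of object i onto the pivot line a-b.
            coords[i][dim] = (
                residual_sq(a, i, dim) + dab_sq - residual_sq(b, i, dim)
            ) / (2.0 * dab)
    return coords
```

The resulting coordinate vectors could then be handed to any standard SVM implementation; that step is omitted here to keep the sketch dependency-free.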
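In existing image classification systems that use deep neural networks, the knowledge needed for image classification is stored implicitly in the model parameters. If users want to update this knowledge, they need to fine-tune the model parameters. Moreover, users cannot verify the validity of inference results or evaluate the contribution of knowledge to the results. In this paper, we investigate a system that stores knowledge for image classification, such as image feature maps, labels, and original images, not in model parameters but in external high-capacity storage. Our system refers to the storage like a database when classifying input images. To add knowledge, our system updates the database instead of fine-tuning the model parameters, which avoids catastrophic forgetting in incremental learning scenarios. We revisit a kNN (k-nearest neighbor) classifier and employ it in our system. By analyzing the neighborhood samples referred to by the kNN algorithm, we can interpret how knowledge learned in the past is used for inference results. Our system achieves 79.8% top-1 accuracy on the ImageNet dataset without fine-tuning the model parameters after pretraining, and 90.8% accuracy on the Split CIFAR-100 dataset in the task-incremental learning setting.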
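The storage-backed kNN idea in the abstract above can be illustrated with a small in-memory store: classification is a majority vote over the nearest stored features, adding knowledge is an append rather than a parameter update, and the ids of the neighbors that voted serve as an explanation. This is a toy sketch under stated assumptions. The class name `KnnStore`, the Euclidean distance, and the plain list used as "storage" are hypothetical and stand in for the feature extractor and high-capacity storage the abstract describes.

```python
import math
from collections import Counter


class KnnStore:
    """Tiny database-style kNN classifier over stored (feature, label) entries."""

    def __init__(self, k=3):
        self.k = k
        self.entries = []  # list of (feature, label, source_id)

    def add(self, feature, label, source_id):
        """Add knowledge by appending an entry; nothing is retrained."""
        self.entries.append((list(feature), label, source_id))

    def classify(self, feature):
        """Return (predicted_label, ids_of_neighbors_used) for a query feature."""
        if not self.entries:
            raise ValueError("the store is empty")
        ranked = sorted(self.entries, key=lambda e: math.dist(e[0], feature))
        neighbors = ranked[: self.k]
        votes = Counter(label for _, label, _ in neighbors)
        predicted = votes.most_common(1)[0][0]
        return predicted, [source_id for _, _, source_id in neighbors]
```

Because a new class is introduced only by calling `add`, predictions for queries near previously stored classes are unaffected, which is the property the abstract relies on to avoid catastrophic forgetting.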
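Onomatopoeic words, which are character sequences that phonetically imitate a sound, are effective in expressing characteristics of a sound such as its duration, pitch, and timbre. We propose an environmental-sound-extraction method that uses onomatopoeic words to specify the target sound to be extracted. With this method, we estimate a time-frequency mask from an input mixture spectrogram and an onomatopoeic word using a U-Net architecture, then extract the corresponding target sound by masking the spectrogram. Experimental results indicate that the proposed method can extract only the target sound corresponding to the onomatopoeic word and performs better than conventional methods that use sound-event classes to specify the target sound.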
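The final extraction step in the abstract above, masking the mixture spectrogram, is an element-wise product between the spectrogram and the estimated time-frequency mask. The sketch below shows only that step. The mask is supplied by hand rather than predicted by a U-Net, and the function name `apply_mask` and the nested-list layout (rows as frequency bins, columns as time frames) are assumptions for illustration.

```python
def apply_mask(mixture, mask):
    """Multiply a magnitude spectrogram by a time-frequency mask, element-wise.

    `mixture` and `mask` are nested lists of equal shape. Mask values are
    expected to lie in [0, 1]: 1 keeps a time-frequency bin, 0 removes it.
    """
    if len(mixture) != len(mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(mixture, mask)
    ):
        raise ValueError("mixture and mask must have the same shape")
    return [
        [value * weight for value, weight in zip(row, mask_row)]
        for row, mask_row in zip(mixture, mask)
    ]
```

In a full system, the masked magnitudes would be combined with the mixture phase and inverted back to a waveform; that part is outside this sketch.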
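In recent years, a wide range of investment models have been created using artificial intelligence. Automated trading by artificial intelligence can expand the range of trading methods, for example by making it possible to operate 24 hours a day and to trade at high frequency. If past data can be taken into account sufficiently, automated trading can also be expected to trade on more information than would otherwise be used. In this paper, we propose an investment agent based on a deep reinforcement learning model, which is one type of artificial intelligence model. The model accounts for the transaction costs involved in actual trading and frames trading over long periods so that it can make a large profit on a single trade. In doing so, it can maximize profit while keeping transaction costs low. In addition, with practical operation in mind, we use online learning so that the system continues to learn by constantly incorporating the latest online data instead of relying on static data. This makes it possible to trade in non-stationary financial markets by always incorporating information on current market trends.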
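The trade-off described in the abstract above, between profit and transaction costs, can be made concrete with a toy profit calculation in which every change of position pays a cost proportional to the traded value. This is not the paper's reward function. The proportional cost model, the position encoding (-1 short, 0 flat, 1 long), the default `cost_rate`, and the name `net_profit` are all assumptions for illustration.

```python
def net_profit(prices, positions, cost_rate=0.001):
    """Profit of a position sequence after proportional transaction costs.

    `positions[t]` is the position held while the price moves from
    `prices[t]` to `prices[t + 1]`, so len(positions) == len(prices) - 1.
    Every change in position at time t costs cost_rate * |change| * prices[t].
    The final position is left open (no closing cost is charged).
    """
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price interval")
    profit = 0.0
    previous = 0  # start flat
    for t, position in enumerate(positions):
        profit -= cost_rate * abs(position - previous) * prices[t]
        profit += position * (prices[t + 1] - prices[t])
        previous = position
    return profit
```

On a steadily rising price series, holding one long position pays the cost once, whereas repeatedly entering and exiting pays it on every change, so under this cost model the longer-horizon strategy ends with the higher net profit.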